Search Results
VATT Paper Review (Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text)
PR-314: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio, and Text
Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | NeurIPS 2021
[Paper Review] Multimodal Learning With Transformers: A Survey
[Paper Review] Multimodal Transformer for Unaligned Multimodal Language Sequences
[SUB] multimodal transformer for unaligned multimodal language sequences
[SUB] Switch Transformers Paper review!
PR-334: CMT: Convolutional Neural Networks Meet Vision Transformers
[Open DMQA Seminar] Multimodal Learning
PR-318: Emerging Properties in Self-Supervised Vision Transformers
RS-024: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
[논문미식회 (Paper Gourmet Club)] CV312: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale